Exploratory Data Analysis
Compound measurements over time by treatment
The following plots show every compound against time, colored by
treatment group. We generated scatterplots for the compounds in each of
three matrices (whole blood, oral fluid, and breath), covering all
timepoints and all three treatments (placebo, low dose, and high dose).
In these scatterplots, the THC biomarker in whole blood stands out. The
placebo and THC treatment groups separate visibly, which suggests that
THC measured in whole blood may be a relatively reliable indicator of
recent cannabis joint use.
scatter_WB <- map(compounds_WB, ~ compound_scatterplot_group_by_treatment(
dataset=WB,
compound=.x,
timepoints=timepoints_WB))








scatter_OF <- map(compounds_OF, ~ compound_scatterplot_group_by_treatment(
dataset=OF,
compound=.x,
timepoints=timepoints_OF))







scatter_BR <- map(compounds_BR, ~ compound_scatterplot_group_by_treatment(
dataset=BR,
compound=.x,
timepoints=timepoints_BR))

The next set of scatterplots again shows every compound against time,
colored by treatment, but with a log transformation applied to the
y-axis (the compound measurement). That transformation is the only
change from the previous scatterplots.
On the log scale, THC measured in breath shows a clearer separation
between the placebo and THC treatment groups. A log axis compresses
large values and spreads out small ones, so differences at low
concentrations that are hard to see on a linear scale become visible.
The choice of axis scale therefore affects which treatment effects on
THC levels are apparent, and it should be kept in mind when reading
plots of compound data over time.
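To make the effect of the log axis concrete, here is a minimal sketch with made-up numbers (not study data). It measures how much of the plotted y-range lies between the two group medians on a linear axis and on a log10 axis. The helper `gap_share()` and both vectors are hypothetical. How the plotting function in this report handles zero measurements (which have no finite logarithm) is not shown, so the toy values are all positive.

```r
# Toy concentrations (invented for illustration, all > 0)
placebo <- c(0.1, 0.2, 0.1, 0.3)
thc     <- c(2, 5, 40, 500)

# Fraction of the plotted y-range that separates the two group medians,
# after applying the axis transformation `f`
gap_share <- function(a, b, f = identity) {
  fa <- f(a)
  fb <- f(b)
  abs(median(fb) - median(fa)) / diff(range(c(fa, fb)))
}

linear_share <- gap_share(placebo, thc)         # linear axis
log_share    <- gap_share(placebo, thc, log10)  # log10 axis
print(c(linear = linear_share, log10 = log_share))
```

On the linear axis the single large value stretches the range, so both groups sit near the bottom of the plot. On the log axis the gap between the medians takes up a much larger share of the range.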
scatter_WB_by_treatment <- map(compounds_WB, ~ compound_scatterplot_group_by_treatment_log(
dataset=WB,
compound=.x,
timepoints=timepoints_WB))








scatter_OF_by_treatment <- map(compounds_OF, ~ compound_scatterplot_group_by_treatment_log(
dataset=OF,
compound=.x,
timepoints=timepoints_OF))







scatter_BR_by_treatment <- map(compounds_BR, ~ compound_scatterplot_group_by_treatment_log(
dataset=BR,
compound=.x,
timepoints=timepoints_BR))

Dropping compounds that obviously do not work as indicators from the
compound lists. WB: cbd, thccooh, thccooh_gluc, thcv. OF: thcoh.
compounds_WB = compounds_WB[- c(2, 5, 6, 8)]
compounds_OF = compounds_OF[- c(4)]
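Dropping by position depends on the order of the vector at the time the line runs. As a hedged alternative, the same removal can be written by name. The vector below is a hypothetical stand-in: only the four dropped names come from the text above, while the other names and the ordering are assumptions chosen so that positions 2, 5, 6, and 8 match.

```r
# Hypothetical compound vector; the real contents and order of compounds_WB may differ
compounds_example <- c("thc", "cbd", "cbn", "thcoh",
                       "thccooh", "thccooh_gluc", "cbg", "thcv")

# Names listed in the text as dropped for whole blood
drop_WB <- c("cbd", "thccooh", "thccooh_gluc", "thcv")

by_index <- compounds_example[-c(2, 5, 6, 8)]    # positional removal
by_name  <- setdiff(compounds_example, drop_WB)  # removal by name, order-independent

print(by_name)
```

Removal by name keeps working if a compound is later added to or reordered within the vector, which positional indexing does not guarantee.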
Calculating sensitivity and specificity.
output_WB <- map_dfr(compounds_WB,
~ sens_spec_cpd(
dataset = WB,
cpd = all_of(.x),
timepoints = timepoints_WB
)) |> clean_gluc()
output_BR <- map_dfr(compounds_BR,
~ sens_spec_cpd(
dataset = BR,
cpd = all_of(.x),
timepoints = timepoints_BR
)) |> clean_gluc()
output_OF <- map_dfr(compounds_OF,
~ sens_spec_cpd(
dataset = OF,
cpd = all_of(.x),
timepoints = timepoints_OF
)) |> clean_gluc()
cutoff vs. sensitivity/specificity
Here we plot the cutoff value against sensitivity and specificity for
every compound in every matrix, and arrange the plots into one large
figure. Unlike an ROC curve, which plots sensitivity against specificity
directly, these plots keep the cutoff on the x-axis, which makes it
easier to look for an optimal cutoff. Overall, the specificity of every
compound increases as the detection limit rises, while sensitivity falls
toward zero.
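The trade-off described above can be reproduced on toy data. This is a minimal sketch, not the `sens_spec_cpd()` helper used in this report (its source is not shown). The function name `sweep_cutoffs()`, the invented measurements, and the rule that a value at or above the cutoff counts as a positive test are all assumptions.

```r
# Invented measurements: `exposed` marks samples from a THC treatment group
value   <- c(0, 0, 0.4, 1.2, 0.8, 3, 6, 12, 25, 40)
exposed <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)

sweep_cutoffs <- function(value, exposed, cutoffs) {
  rows <- lapply(cutoffs, function(cut) {
    test_pos <- value >= cut  # assumed rule: at or above the cutoff is a positive test
    TP <- sum(test_pos & exposed)
    FN <- sum(!test_pos & exposed)
    FP <- sum(test_pos & !exposed)
    TN <- sum(!test_pos & !exposed)
    data.frame(cutoff = cut,
               sensitivity = TP / (TP + FN),
               specificity = TN / (TN + FP))
  })
  do.call(rbind, rows)
}

ss <- sweep_cutoffs(value, exposed, cutoffs = c(0.5, 1, 2, 5, 10, 50))
print(ss)
```

In this toy example sensitivity never increases and specificity never decreases as the cutoff rises, which is the pattern the plots show.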
#arranges ss plots into one
ss_bottom_row <-
plot_grid(
ss_OF,
ss_BR,
labels = c('B', 'C'),
label_size = 12,
ncol = 2,
rel_widths = c(0.66, .33)
)
plot_grid(
ss_WB,
ss_bottom_row,
labels = c('A', ''),
label_size = 12,
ncol = 1
)

Average sensitivity and specificity vs. detection limit
output_WB_avg = average_sens_spec(output = output_WB)
output_OF_avg = average_sens_spec(output = output_OF)
output_BR_avg = average_sens_spec(output = output_BR)
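The helper `average_sens_spec()` is not shown. Judging by the columns it prints later in this report (`detection_limit`, `average_sensitivity`, `average_specificity`), it plausibly averages each metric over the time windows at every detection limit. A minimal base-R sketch of that idea follows. The toy table, the decision to skip undefined values with `na.rm = TRUE`, and the name `avg` are assumptions.

```r
# Toy per-window results; sensitivity is undefined (NaN) before smoking,
# where there are no true positives or false negatives to count
toy <- data.frame(
  detection_limit = rep(c(1, 5), each = 3),
  time_window     = rep(c("pre-smoking", "0-30", "31-90"), times = 2),
  Sensitivity     = c(NaN, 1.0, 0.8, NaN, 0.9, 0.5),
  Specificity     = c(0.9, 0.5, 0.7, 1.0, 0.9, 0.8)
)

# Average each metric over the time windows, separately for each detection limit
avg_sens <- tapply(toy$Sensitivity, toy$detection_limit, mean, na.rm = TRUE)
avg_spec <- tapply(toy$Specificity, toy$detection_limit, mean, na.rm = TRUE)

avg <- data.frame(
  detection_limit     = as.numeric(names(avg_sens)),
  average_sensitivity = as.vector(avg_sens),
  average_specificity = as.vector(avg_spec)
)
print(avg)
```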
ss_WB_avg_together <-
ss_plot_avg_together(output_WB_avg, tpts = length(unique(output_WB$time_start)), tissue = "Blood")

ss_OF_avg_together <-
ss_plot_avg_together(output_OF_avg, tpts = length(unique(output_OF$time_start)), tissue = "Oral Fluid")

ss_BR_avg_together <-
ss_plot_avg_together(output_BR_avg, tpts = length(unique(output_BR$time_start)), tissue = "Breath")

I will now remove every compound whose average sensitivity and average
specificity curves do not intersect. Reasoning: for compounds with no
intersection, the point of best sensitivity (the leftmost point of the
graph) is also the point of worst specificity. There is no room for
adjustment, because moving the cutoff from there only makes things
worse.
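One way to check this criterion programmatically, sketched under the assumption that each compound has an averaged table like the one produced by `average_sens_spec()`: the curves meet somewhere on the cutoff grid if the difference between them changes sign or hits zero. The function name `curves_intersect()` and the toy vectors are hypothetical.

```r
# TRUE if sensitivity - specificity changes sign (or equals zero) across the cutoff grid
curves_intersect <- function(sensitivity, specificity) {
  d <- sensitivity - specificity
  min(d) <= 0 && max(d) >= 0
}

# Toy curves over an increasing cutoff grid
crossing_sens <- c(0.95, 0.90, 0.80, 0.60)
crossing_spec <- c(0.50, 0.85, 0.92, 0.99)

flat_sens <- c(0.40, 0.30, 0.20, 0.10)  # sensitivity below specificity everywhere
flat_spec <- c(0.70, 0.85, 0.95, 1.00)

print(curves_intersect(crossing_sens, crossing_spec))
print(curves_intersect(flat_sens, flat_spec))
```

The check only sees the cutoffs that were actually evaluated, so a coarse grid can make a compound look better or worse than it is.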
compounds_WB = c("thc")
compounds_OF = c("thc")
compounds_BR = NULL
sensitivity vs. specificity
In this visual representation, we graph the sensitivity against
specificity for each compound within every matrix, consolidating the
data into a comprehensive plot. This collective visualization allows for
a convenient comparison of the performance of various biomarkers
concerning their specificity and sensitivity.
output_WB <- map_dfr(compounds_WB,
~ sens_spec_cpd(
dataset = WB,
cpd = all_of(.x),
timepoints = timepoints_WB
)) |> clean_gluc()
output_OF <- map_dfr(compounds_OF,
~ sens_spec_cpd(
dataset = OF,
cpd = all_of(.x),
timepoints = timepoints_OF
)) |> clean_gluc()
#plot sensitivity vs. specificity
roc_WB = roc_plot(output_WB, tpts = length(unique(output_WB$time_start)), tissue = "Blood")

roc_OF = roc_plot(output_OF, tpts = length(unique(output_OF$time_start)), tissue = "Oral Fluid")

# #arrange roc plots
# roc_bottom_row <-
# plot_grid(
# roc_OF,
# roc_BR,
# labels = c('B', 'C'),
# label_size = 12,
# ncol = 2,
# rel_widths = c(0.66, .33)
# )
# plot_grid(
# roc_WB,
# roc_bottom_row,
# labels = c('A', ''),
# label_size = 12,
# ncol = 1
# )
It should be apparent that OF-THC is the superior choice. Now we dig
deeper into OF-THC to find a specific cutoff. Referring back to the
average sensitivity and specificity vs. detection limit plot, we see
that sensitivity and specificity are both high when the detection limit
is very close to 0. Let’s try out some more cutoffs close to 0.
plot sensitivity and specificity over time given specific cutoffs
Here we take a closer look at sensitivity and specificity over time for
THC measured in blood and in oral fluid. Comparing the two directly,
oral fluid outperforms blood on both sensitivity and specificity,
particularly within the critical window of three hours post-smoking.
#pass specific cutoff into splits parameter
OF_THC <- sens_spec_cpd(
dataset = OF,
cpd = 'thc',
timepoints = timepoints_OF,
splits = c(0.5, 1, 2, 5, 10)
) |> clean_gluc()
of_levels <- c("pre-smoking\nN=192", "0-30\nmin\nN=192", "31-90\nmin\nN=117",
"91-180\nmin\nN=99", "181-210\nmin\nN=102", "211-240\nmin\nN=83",
"241-270\nmin\nN=90", "271+\nmin\nN=76")
plot_cutoffs(dataset=OF_THC,
timepoint_use_variable=OF$timepoint_use,
tissue="Oral Fluid",
cpd="THC",
x_labels=NULL)
## [[1]]

##
## [[2]]
## # A tibble: 40 × 18
## TP FN FP TN detection_limit compound time_start time_stop
## <dbl> <dbl> <int> <int> <fct> <chr> <dbl> <dbl>
## 1 0 0 35 157 0.5 THC -400 0
## 2 0 0 20 172 1 THC -400 0
## 3 0 0 9 183 2 THC -400 0
## 4 0 0 0 192 5 THC -400 0
## 5 0 0 0 192 10 THC -400 0
## 6 129 0 39 24 0.5 THC 0 30
## 7 129 0 30 33 1 THC 0 30
## 8 128 1 19 44 2 THC 0 30
## 9 128 1 3 60 5 THC 0 30
## 10 125 4 1 62 10 THC 0 30
## # ℹ 30 more rows
## # ℹ 10 more variables: time_window <fct>, NAs <int>, N <int>, N_removed <int>,
## # Sensitivity <dbl>, Specificity <dbl>, PPV <dbl>, NPV <dbl>,
## # Efficiency <dbl>, my_label <fct>
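As a worked check of how the summary columns relate to the counts, take row 9 of the table above (0-30 min window, cutoff 5: TP = 128, FN = 1, FP = 3, TN = 60). The formulas below are the standard definitions. That the hidden `Sensitivity`, `Specificity`, `PPV`, `NPV`, and `Efficiency` columns use exactly these (with efficiency meaning overall accuracy) is an assumption.

```r
# Counts copied from row 9 of the printed table (0-30 min, detection limit 5)
TP <- 128; FN <- 1; FP <- 3; TN <- 60

metrics <- c(
  sensitivity = TP / (TP + FN),                  # true positives among all actual positives
  specificity = TN / (TN + FP),                  # true negatives among all actual negatives
  ppv         = TP / (TP + FP),                  # true positives among all positive tests
  npv         = TN / (TN + FN),                  # true negatives among all negative tests
  efficiency  = (TP + TN) / (TP + FN + FP + TN)  # overall fraction classified correctly
)
print(round(metrics, 3))
```

The four counts add up to 192, which matches the N given for the 0-30 min window in the axis labels.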
The average sensitivity is a lot more sensitive (ha) to the cutoff than
the average specificity: specificity only dips in the 31-90 min window
when the cutoff is lowered, whereas a lower cutoff increases sensitivity
across the board, regardless of the time window. Additionally, the
31-90 min window where specificity is heavily affected by a low cutoff
is, in my opinion, of minor importance: it should be quite apparent that
someone is high if they smoked within the last 90 min. A lowered
specificity in this time frame isn’t cause for much concern, considering
how much sensitivity is gained by using a low cutoff.
In a nutshell: a low cutoff is optimal, approximately somewhere between
0 and 2. Let’s test more cutoffs in this range:
OF_THC <- sens_spec_cpd(
dataset = OF,
cpd = 'thc',
timepoints = timepoints_OF,
splits = c(0.1, 0.25, 0.5, 1, 1.5)
) |> clean_gluc()
blood_levels <- c("pre-smoking\nN=189", "0-30\nmin\nN=187", "31-70\nmin\nN=165",
"71-100\nmin\nN=157", "101-180\nmin\nN=168", "181-210\nmin\nN=103",
"211-240\nmin\nN=127", "241-270\nmin\nN=137", "271-300\nmin\nN=120",
"301+\nmin\nN=88")
of_levels <- c("pre-smoking\nN=192", "0-30\nmin\nN=192", "31-90\nmin\nN=117",
"91-180\nmin\nN=99", "181-210\nmin\nN=102", "211-240\nmin\nN=83",
"241-270\nmin\nN=90", "271+\nmin\nN=76")
plot_cutoffs(dataset=OF_THC,
timepoint_use_variable=OF$timepoint_use,
tissue="Oral Fluid",
cpd="THC",
x_labels=NULL)
## [[1]]

##
## [[2]]
## # A tibble: 40 × 18
## TP FN FP TN detection_limit compound time_start time_stop
## <dbl> <dbl> <int> <int> <fct> <chr> <dbl> <dbl>
## 1 0 0 37 155 0.1 THC -400 0
## 2 0 0 37 155 0.25 THC -400 0
## 3 0 0 35 157 0.5 THC -400 0
## 4 0 0 20 172 1 THC -400 0
## 5 0 0 12 180 1.5 THC -400 0
## 6 129 0 44 19 0.1 THC 0 30
## 7 129 0 44 19 0.25 THC 0 30
## 8 129 0 39 24 0.5 THC 0 30
## 9 129 0 30 33 1 THC 0 30
## 10 128 1 23 40 1.5 THC 0 30
## # ℹ 30 more rows
## # ℹ 10 more variables: time_window <fct>, NAs <int>, N <int>, N_removed <int>,
## # Sensitivity <dbl>, Specificity <dbl>, PPV <dbl>, NPV <dbl>,
## # Efficiency <dbl>, my_label <fct>
They all look pretty promising, so we need a way to quantify this. I am
going to calculate the sensitivity and specificity for cutoff values
between 0 and 2.
output_OF = sens_spec_cpd_OFTHC(
dataset = OF,
cpd = "thc",
timepoints = timepoints_OF
) |> clean_gluc()
output_OF_avg = average_sens_spec(output = output_OF)
output_OF_avg
## # A tibble: 101 × 4
## compound detection_limit average_sensitivity average_specificity
## <chr> <dbl> <dbl> <dbl>
## 1 THC 0 0.956 0
## 2 THC 0.02 0.956 0.817
## 3 THC 0.04 0.956 0.817
## 4 THC 0.06 0.956 0.817
## 5 THC 0.08 0.956 0.817
## 6 THC 0.1 0.956 0.817
## 7 THC 0.12 0.956 0.817
## 8 THC 0.14 0.956 0.817
## 9 THC 0.16 0.956 0.817
## 10 THC 0.18 0.956 0.817
## # ℹ 91 more rows
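The helper `sens_spec_cpd_OFTHC()` is not shown, but the 101 rows and the 0.02 spacing in the output above are consistent with an evenly spaced cutoff grid from 0 to 2. A sketch of building such a grid follows. It could presumably be passed through the `splits` argument of `sens_spec_cpd()` used earlier, but that is an assumption about the helper, not something confirmed here.

```r
# Evenly spaced candidate cutoffs from 0 to 2 in steps of 0.02
fine_cutoffs <- seq(0, 2, by = 0.02)

print(length(fine_cutoffs))   # number of candidate cutoffs
print(head(fine_cutoffs, 3))  # first few grid points
```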
Let’s plot this quickly.
ss_OF_avg_together <-
ss_plot_avg_together(output_OF_avg, tpts = length(unique(output_OF$time_start)), tissue = "Oral Fluid")

We found it. The place where the two curves intersect is where
sensitivity and specificity are balanced (this is not guaranteed to be
where their sum is largest). Let’s get the specific value.
output_OF_avg |>
filter(abs(average_sensitivity-average_specificity) < 0.01) |>
mutate(diff = abs(average_sensitivity-average_specificity))
## # A tibble: 5 × 5
## compound detection_limit average_sensitivity average_specificity diff
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 THC 0.82 0.893 0.890 0.00338
## 2 THC 0.84 0.893 0.890 0.00338
## 3 THC 0.86 0.893 0.890 0.00338
## 4 THC 0.88 0.893 0.890 0.00338
## 5 THC 0.9 0.893 0.890 0.00338
At cutoffs from 0.82 to 0.90, the difference between the average
sensitivity and the average specificity is at its smallest (about
0.003). We’ll pick 0.85 for aesthetics’ sake.
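A quick check with the numbers printed above (rounded values copied from the two output tables, so the results are approximate). It compares the sum of average sensitivity and specificity at the lowest nonzero cutoff with the sum on the 0.82-0.90 plateau, and confirms that 0.85 falls inside that plateau. The variable names are illustrative only.

```r
# Rounded values copied from the printed tables above
low_cutoff <- c(sensitivity = 0.956, specificity = 0.817)  # detection_limit 0.02
plateau    <- c(sensitivity = 0.893, specificity = 0.890)  # detection_limit 0.82 to 0.90

sum_low     <- sum(low_cutoff)
sum_plateau <- sum(plateau)
print(c(low = sum_low, plateau = sum_plateau))

# The chosen cutoff lies within the range where the averaged metrics do not change
plateau_range <- c(0.82, 0.90)
chosen <- 0.85
inside <- chosen >= plateau_range[1] && chosen <= plateau_range[2]
print(inside)
```

With these rounded values the two sums differ by only about 0.01, so the plateau is attractive mainly because it balances the two metrics. Both averages are identical to the printed precision across 0.82-0.90, so any cutoff in that range, including 0.85, gives the same averaged metrics.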